Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributed Sampling in cuGraph-PyG #4384

Merged
merged 113 commits into from
May 30, 2024

Conversation

alexbarghi-nv
Copy link
Member

@alexbarghi-nv alexbarghi-nv commented May 1, 2024

Distributed sampling in cuGraph-PyG. Also renames the existing API to clarify that it is dask based.
Adds a dependency on tensordict for cuGraph-PyG which supports the new TensorDictFeatureStore.
Also no longer installs torch-cluster and torch-spline-conv in CI for testing since that results in an ImportError and neither of those packages are needed.

Requires PyG 2.5. Should be merged after #4335

Merge after #4355

Closes #4248
Closes #4249
Closes #3383
Closes #3942
Closes #3836
Closes #4202
Closes #4051
Closes #4326
Closes #4252
Partially addresses #3805

seunghwak and others added 30 commits April 2, 2024 17:28
@alexbarghi-nv alexbarghi-nv requested review from a team as code owners May 17, 2024 20:31
Copy link
Member

@tingyu66 tingyu66 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with some comments and questions. The overall logic looks reasonable and trust you with matching the detailed interface of PyG's native Graph- and FeatureStore.

python/cugraph-pyg/cugraph_pyg/examples/gcn_dist_snmg.py Outdated Show resolved Hide resolved
sz = torch.tensor(num_vertices[vtype], device="cuda")
torch.distributed.all_reduce(sz, op=torch.distributed.ReduceOp.MAX)
num_vertices[vtype] = int(sz)
return num_vertices
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we cache num_vertices and num_edges since they require collective communications?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cache the offsets, which are more frequently used, but I might add caching the vertex and edge counts if they are used more often. So maybe in the future.

@BradReesWork
Copy link
Member

/merge

@alexbarghi-nv
Copy link
Member Author

CI failed with bus error again, I think due to OOM. I just pushed a commit that skips that test.

@BradReesWork
Copy link
Member

/merge

@rapids-bot rapids-bot bot merged commit 797a036 into rapidsai:branch-24.06 May 30, 2024
132 checks passed
rapids-bot bot pushed a commit that referenced this pull request May 30, 2024
Reimplements the WG feature store for PyG using the `FeatureStore` interface.
Merge after #4384 

Closes rapidsai/wholegraph#47
Closes #4399

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Seunghwa Kang (https://github.com/seunghwak)
  - Tingyu Wang (https://github.com/tingyu66)
  - Ralph Liu (https://github.com/nv-rliu)

Approvers:
  - Tingyu Wang (https://github.com/tingyu66)
  - Vibhu Jawa (https://github.com/VibhuJawa)
  - Brad Rees (https://github.com/BradReesWork)
  - Ray Douglass (https://github.com/raydouglass)

URL: #4432
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment